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Abstract. In this paper, we propose a decomposition approach for eigenvalue problems 
with spatial symmetries, including the formulation, discretization as well as implementation. 
In the formulation, the original problem is divided into a set of subproblems and only a smaller 
number of eigenpairs is required for each subproblem. This formulation can handle eigenvalue 
problems with either Abelian or non-Abelian symmetries, and is easy for grid-based discretiza- 
tions such as finite difference, finite element or finite volume methods. We implement the 
decomposition approach with finite elements and parallelize our code in two levels. We ana- 
lyze and illustrate that the decomposition approach can improve the efficiency and scalability 
of iterative diagonalization. In particular, we apply it to Kolm-Sham equations of symmetric 
molecules consisting of several hundreds of atoms. 

Keywords. Eigenvalue problem, Abelian and non-Abelian symmetry. Group theory. Symmetry- 
adapted linear combination. Grid-based discretizations. Two-level parallelism. 

1 Introduction 

Many problems in physics, chemistry and engineering lead to eigenvalue problems of the form 



subject to some boundary condition, where n G {1,2,3} and L is an Hermitian operator. The 
number of required eigenvalues and eigenfunctions, Nf., could be very large in some applications. 
For example, in quantum chemistry (see, e.g., [11, 50] and references cited therein), operator L 
in (1.1) might be the Hamiltonian of a molecular system, and Ng increases with to the number 
of valence electrons in the system. 

Numerical computations of eigenvalues and associated eigenfunctions are usually necessary. 
It can be very expensive since the computational cost grows in proportion to N^N, where 
N is size of the discretization. Thus a decomposition of the problem over domain or over 
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eigenvalues is in insistent demand. While it is challenging to decompose eigenvalue problems 
since they have an intrinsic nonlinearity and can be regarded as global optimization problems 
with orthonormal constraints. Many kinds of domain decomposition methods derived from 
boundary value problems do not work effectively for eigenvalue problems. 

Symmetry provides a way to do decomposition. Eigenfunctions can be classified into orthog- 
onal categories according to group representation theory. And each category of eigenfunctions 
can be solved in a subdomain of il.. Symmetry occurs in everyday life, see, e.g., [8] for a 
comprehensive illustration of the ubiquitous role of symmetry in nature, music, architecture, 
and sciences. Mathematically, each symmetry corresponds to an operator, such as a reflection, 
a rotation or an inversion, that leaves the object or problem invariant. The classification of 
eigenfunctions comes from their distinct behaviors under the application of symmetry operators. 

In quantum physics and quantum chemistry, the so-called symmetry-adapted linear com- 
binations (SALCs) are constructed from specific basis functions like atomic orbitals, internal 
coordinates of a molecule, or orthogonalized plane waves [8, 14, 15]. Under the SALCs, matrix 
representation of the problem is block diagonal. Projection operators defined by group theory 
determine the proper way to combine the original basis functions into SALCs. In [15], the au- 
thor has given a case-by-case illustration of the way to construct SALCs from atomic orbitals, 
from which can be seen that the construction of SALCs is not an easy task. 

Grid-based discretizations, such as finite difference, finite element and finite volume meth- 
ods, are widely used in scientific and engineering computations [3, 4, 12, 17, 25]. For instance, 
the finite element method is important in the discretization of eigenvalue problems in struc- 
tural analysis [7, 21, 28]. In the last two decades, grid-based discretization approaches have 
been successfully applied to modern electronic structure calculations, see [6, 16, 40, 45] and 
reference cited therein. In particular, they have been proven to be well accommodated to peta- 
scale computing by treating extremely large-scale eigenvalue problems arising from the electron 
structure calculations [26, 29]. 

Grid-based discretizations have good locality and usually result in a large number of degrees 
of freedom. And finite difference methods do not have basis functions in the classical sense. 
These facts increase the numerical difficulty to construct SALCs. In this work, we focus on 
how to use Abelian and non-Abelian symmetries for grid-based discretizations. We propose a 
decomposition formulation easy for grid-based discretizations, which does not need the explicit 
construction of SALCs. The original eigenvalue problem is decomposed into a set of subproblems 
characterized by distinct conditions derived from group representation theory. We also give the 
discretized subproblems for Abelian and non-Abelian symmetries. For the purpose of comparing 
our approach with constructing SALCs, we provide mathematical guidance to the construction 
of SALCs, and then illustrate the equivalence between them by deducing the exact relation of 
our discretized eigenvalue problem to that formed by SALCs. 

We implement the decomposition approach using finite element methods. Subproblems 
corresponding to different irreducible representations can be solved independently. Accordingly, 
our code is is parallelized in two levels, including a fundamental level of spatial parallelization 
and another level of subproblem distribution. We apply the approach to solving the Kohn- 
Sham equation of some cluster systems with symmetries. It is indicated that the decomposition 
approach would be appreciable for large-scale eigenvalue problems. We should mention that 
the implementation techniques can be adapted to finite difference and finite volume methods. 

We should mention that Bossavit [9, 10] has employed group theory to decompose boundary 
value problems arising from structural analysis. Note that the design and analysis of decompo- 
sition methods for eigenvalue problems is different from boundary value problems, and [5, 10] 
have pointed out the programming difficulty in the implementation of using symmetries. Kro- 



2 



nik and his collaborators [32] have utilized Abelian symmetries in their finite difference codes to 
simplify the solving of Kohn-Sham equations. However, their implementation is only applicable 
to Abelian symmetry groups in which any two symmetry operators are commutative. In some 
plane- wave softwares of electronic structure calculations, symmetries are used to simplify the 
solving of Kohn-Sham equations by reducing the number of A;-points to the irreducible Brillouin 
zone (IBZ). Two /c-points related by some symmetry operation are considered equivalent. For a 
given fc-point, they do not classify the eigenfunctions and thus still solve the original eigenvalue 
problem. 

The rest of this paper is organized as follows. In Section 2, we prove the symmetry-based 
decomposition of eigenvalue problems, and propose a subproblem formulation proper for grid- 
based discretizations. Then in Section 3, we give matrix eigenvalue problems derived from 
the subproblem formulation and deduce the relation to those formed by SALCs, for which 
purpose we abstract the construction of SALCs mathematically. We quantize the decrease in 
computational cost when using the decomposition approach in Section 4. And in Section 5 we 
present some critical implementation issues. In Section 6, we give numerical examples to validate 
our implementation for Abelian and non- Abelian symmetry groups and show the reduction in 
computational and communicational overhead; then we apply the decomposition approach to 
the Kohn-Sham equation in electronic structure calculations. Finally some concluding remarks 
are given. 

2 Decomposition formulation 

In this section, we recall some basic but useful results of group theory and reformulate the 
symmetry-based decomposition for eigenvalue problems to achieve Theorem 2.1. Some notation 
and concept will be given in Appendix A. 

2.1 Representation, basis function, and projection operator 

We start the discussion from orthogonal coordinate transformations in M*^ such as a rotation, 
a reflection or an inversion, that form a finite group G of order g. Denote the function space 
to be considered as V. Each R & G corresponds to an operator Pr on function / € F as 

PRf{Rx) = f{x) VxeM". 

It can be proved that {Pr : R € G} form a group isomorphic to G. 

A matrix representation of group G means a group of matrices which is homomorphic to G. 
Any matrix representation with nonvanishing determinants is equivalent to a representation by 
unitary matrices (referred to as unitary representation). In the following we focus on unitary 
representations of group G. 

The great orthogonality theorem (cf. [14, 30, 44, 49]) tells that, all the inequivalent, irre- 
ducible, unitary representations {F^^^} of group G satisfy 

^ T^''\R)llT^'''\R)m'l' = Suu'Smn^'Su'-^, (2.1) 
R&G 

where dp denotes the dimensionality of the z^-th representation F^*^), T^'^\R) = (T^'^\R)mi) 

and V^^\R)*^i is the complex conjugate of T^'^\R)mi. The number of all the inequivalent, 
irreducible, unitary representations is equal to the number of classes in G. We denote this 
number as ric- 
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Definition 2.1. For a given r^'^\ non-zero functions {cl)i^\ 4>2^ , • • • , 0^^^} C V are said to form 
a basis for T^'^^ if for any I € {1, 2, . . . , dy} 



Pr4"^ = ^^l:^r('^\RU Vi?GG. (2.2) 

m=l 



(f)f'^ is called to belong to the l-th column ofT^^^ (or adapt to the u-l symmetry) , and {(j)m^ : m = 
1,2,..., du, m 1} are its partners. 

There holds an orthogonahty property for the basis functions (cf. [44]): if {(j)^^'^^ : I = 
1, 2, . . . , djy} and {'0;, • = 1, 2, . . . , d^'} are basis functions for irreducible representations 
r^*^) and r^'^'), respectively, then the scalar product 

(</.!'^\4"'))=5,,,<^,;,d;^^(</.M,V^M) V/G{l,2,...,d4, /'G{l,2,...,d,,}. (2.3) 

■m=l 

This equation implies that, two functions which belong to different irreducible representations 
or to different columns of the same unitary representation are orthogonal. What is more, the 
scalar product of two functions belonging to the same column of a given unitary representation 
(or adapting to the same symmetry) is independent of the column label. 

Multiply equation (2.2) by T^'''\R)*^,i, and sum over R, the great orthogonality theorem 
(2.1) implies that 

R&G 

Define for any m, / e {1, 2, . . . , d^} operator as 

^ ReG 

then we have 

^S</'r = 5..'5«'4^). (2.5) 

Proposition 2.1. Given v G {1,2, .. . ,?ic} oind k G {1,2, . . . ,dy}. If function v satisfies 
^^^^v ^ 0, then {.^^l^v : I = 1,2, . . . ,dy] form a basis for F^, namely, {^[I'v : I = 
1,2,..., di,} are non-zero functions, and for any I G {1, 2, . . . , d^}, 

Pr (^/r^^) = E ^^•'HRU V i? G G. (2.6) 

m=l 

Proof. For any Z G {1, 2, . . . , d^}, 



Pr [^^^-) = V E ^^'^iS)lPns v = ^E r^'\R-'S')lPs'V V G G, 

^ 5GG ^ 5'gG 

where PrPs = Prs because {Pr : R G G} form a group isomorphic to G. 
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Since T^'^^ is a unitary representation of G, we obtain 



?'GG \m=l / 



m=l 

Recall the way to achieve (2.5), we see from this equation and the great orthogonality 
theorem that 

^1^^ = ^i/^ (^/r^^) V/G{l,2,...,d.}. 
So ^^j}v 7^ indicates ^^^v 7^ for alH € {1, 2, . . . , This completes the proof. □ 
If we set v' = v^V = 1 and m = / in (2.5), then it gives 

^r^r^=^!^^- (2.7) 

Proposition 2.1 implies that (2.7) serves to characterize the labels of any basis function: 

Corollary 2.1. Given v G {1,2, ...,nc} and I G {1, 2, . . . , 6?,^}. Non-zero junction v 
belongs to the l-th column ofV^'^^ (or adapts to the v-l symmetry) if and only if 
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We will use the following properties of operator , whose proof is given in Appendix B. 
Proposition 2.2. (a) The adjoint of operator ^l^] 

(b ) The multiplication of two operators 

^l::>l::''l=Mm'^l::l. (2.10) 

2.2 Subproblems 

It can be seen from Corollary 2.1 and the linearity of operator ^ii'^ that, all functions in V 
belonging to the l-th column of T^'^^ (or adapting to the v-l symmetry) form a subspace of V. 

We denote this subspace by VJ^"^^ . 

There holds a decomposition theorem for any function in V (cf. [44]): any function f £ V 
can be decomposed into a sum of the form 

ric dy 

f = T.Y.f!^^^ (2-11) 

u=l 1=1 

where belongs to subspace vl''^\ We see from (2.5) and (2.11) that : V — VJ^^'' is a 

projection operator. Equation (2.11) implies 

ric dy 
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which indeed is a direct sum 

^ = ©©^^^^ (2-12) 

v=l 1=1 

due to (2.3). 

Now we turn to study the symmetry-based decomposition for eigenvalue problems. Suppose 
group G is a symmetry group associated with eigenvalue problem (1.1), namely 

m = n, PrL = lpr vi? e g, (2.13) 

and the subjected boundary condition is also invariant under {-Pr}. Then any i? S G is called 
a symmetry operation for problem (1.1). For simplicity, we take zero boundary condition as an 
example and discuss the decomposition of eigenvalue problem 

Lu = Xu in C M", , 

n an 2.14) 

u = on oU, 

Since Pr and L are commutative for any R in G, we have: 

Proposition 2.3. If function v G then Lv G where u G {l,2,...,nc} and I G 

{1,2, . . . ,di,}. In other words, V^''^^ is an invariant subspace of operator L. 

The direct sum decomposition of space V and Proposition 2.3 indicate a decomposition of 
the eigenvalue problem. 

Theorem 2.1. Suppose system (2.14) has a symmetry group G = {R} with a finite or- 
der. Denote all the inequivalent, irreducible, unitary representations of G as {T^'^^ : v = 
1,2,... ,nc}. Then the eigenvalue problem can be decomposed into subproblems. For 

any G {1,2, . . . ,nc}, the corresponding d^ subproblems are 



I 



ondn, / = 1,2,...,4, (2-15) 



u 



where k is any chosen number in {1,2, . . . ,di^}. 



Proof. It can be seen from (2.12) and Proposition 2.3 that, other than solving the eigenvalue 
problem in V, we can solve the problem in each subspace V^''^^ independently. The original 
eigenvalue problem (2.14) can be decomposed into d^ subproblems as follows 



u 



I 



ondn, u = l,2,...,nc, l = l,2,...,d^, (2.16) 

t^.;^) = inf], 

where the third equation characterizes u^'^^ G Vi^'^\ as indicated in Corollary 2.1. 

Given any v G {1, 2, . . . , nd, we consider the di, subproblems in (2.16). We shall prove that, 
for any chosen A;G{l,2,...,(ijy}, ifi) and w are two orthogonal eigenfunctions corresponding 
to some eigenvalue of the A;-th subproblem, then ^^^j^v and ^^ij^w are eigenfunctions of the 
l-ih. subproblem with the same eigenvalue, and are also orthogonal. 
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From Equation (2.5) and the fact that Pr and L are commutative for each R, we see S^^v 
is an eigenfunction of the /-th subproblem which corresponds to the same eigenvalue as the one 
for V. It remains to prove the orthogonality of ^'i^v and ^^'^w. Proposition 2.2 indicates 
that the scalar product of any two functions in v'^^ is invariant after operating on them with 
S^'^^ ■ Indeed, we have for any v,w £ V^'^^ that 

= {v,w), 

where the last equality can be obtained from Corollary 2.1. Thus v and !^^w are or- 
thogonal when V and w are. 

Recall L is Hermitian, we obtain that eigenvalues of the dy subproblems are the same, and 
eigenfunctions of the /-th subproblem can be chosen as {j^^^^}, where {v} are eigenfunctions 
of the fc-th subproblem, and k is any chosen number in {1, 2, . . . , dj^}. 

Therefore, the original eigenvalue problem (2.14) can be decomposed into di, subprob- 

lems, and for any G {1,2, . . . ,nc} the corresponding di, subproblems can be written in the 
form shown in (2.15). This completes the proof. □ 

The third equation of the subproblems in (2.15) are 

4"^^ = ^lk^^k^ y 1 = 1,2,. ..,d, ,i^k. 

We see from Proposition 2.1 that {u^'^^ : I = 1,2, . . . ,di^} form a basis for F^^-*. Namely, for 
any Z € {1,2, . . .,du} 

di, 



PruY^ (x) = (^)r^"^ {R)mi yR G G. 



m.=l 



Corollary 2.2. Under the same condition as in Theorem 2.1, eigenvalue problem (2.14) can 
he decomposed into YlZ=i'^i^ subproblems. For any v G {1, 2, . . . , nd, the corresponding dy 
subproblems can be given as follows 

( r„,H 



Luf'^ = \i-)uf^ mn. 



ui'^ = ondVL, l = l,2,...,dy. (2.17) 



The third equations in (2.15) and (2.17) play a critical role in the symmetry-based decom- 
position. The original eigenvalue problem can be decomposed into subproblems just because 
eigenfunctions of subproblems satisfy distinct equations. In the following text, we call these 
equations as symmetry characteristics. 
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Remark 2.1. Symmetry characteristics describe symmetry properties of eigenf unctions over 
domain Q. In [10], the author has given a decomposition formulation for boundary value prob- 
lems with spatial symmetries. Each decomposed problem is characterized by a "boundary con- 
dition" on symmetry element^ T,g, which is in fact a restriction of the symmetry characteristic 
on T,g. 

In our opinion, symmetry characteristics are essential conditions and should not be replaced 
by the restriction on symmetry elements. In some cases, indeed, boundary conditions such as 
Dirichlet or Neumann kind on can be deduced, while in the derivation one needs to use the 
symmetry characteristic near Tig, not only on it. In some other cases, symmetry characteristics 
may not derive proper boundary conditions. Therefore, boundary conditions are not sufficient 
for the well-posedness of subproblems. 

Let us compare eigenvalue structures of the original problem and subproblems. 

We assume for instance that all symmetries of the original problem form a D4 group. The 
D4 group have = 5 irreducible representations; one of them is two-dimensional and all others 
are one-dimensional [14, 44]. Then the eigenvalues are either nondegenerate or have double 
degeneracy, on the assumption that no accidental degeneracy occurs [44, 49]. According to 
Theorem 2.1, the original problem can be decomposed into Ylu=i^f ~ ^ subproblems. Any 
eigenvalue with double degeneracy is distributed to the two subproblems corresponding to u 
with = 2. So all eigenvalues of each subproblem should be nondegenerate. 

Actually, the third equations in subproblem formulations (2.15) and (2.17) imply a relation 
between symmetry and degeneracy. Indeed, for instance, [33, 37, 46] discussed this relation for 
eigenmodes of the Laplacian in two dimensions. In practice, we usually use some subgroup of 
the whole symmetry group. Thus subproblems will probably still have degenerate eigenvalues. 
However, there is a possibility to enlarge the gap of the spectrum, especially when we exploit as 
many symmetries as possible. This would benefit the convergence of iterative diagonalization. 

Formulation (2.17) makes a straightforward implementation for grid-based discretizations. 
We shall discuss the way to solve the subproblems in the next section. 

3 Discretization 

In this section, we represent subproblems (2.17) under grid-based discretizations. We present a 
mathematical guidance to the construction of SALCs in Proposition 3.1 and Remark 3.2. And 
then we illustrate the relation of our discretized systems to those formed by SALCs. 

Note that the d^ subproblems associated with different v values are independent and have 
the same formulations. So we take one v € {1,2, . . . ,nc} and discuss corresponding d^, sub- 
problems. 

3.1 Discretized system 

Suppose $7 is discretized by a symmetrical grid associated with group G, and the grid contains 
degrees of freedom. For simplicity we assume that no grid points lies on symmetry elements. 
We determine a smallest set of degrees of freedom that could produce all N ones by applying 
symmetry operators {R G G}. It is clear that number of degrees of freedom in this smallest set 

^ Symmetry element of operation J? is a point of reference about which R is carried out, such as a point 
to do inversion, a rotation axis, or a reflection plane. It is seen that symmetry element is invariant under the 
associated symmetry operation. 
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A'o = ^N. We denote them as 

{xj : j = 1,2,..., A^o}, 
then all degrees of freedom can be given by 

{R{j) : j = 1,2,..., No, ReG}, 

where R{j) = Rxj {j = l,2,..., Nq). 

The symmetry characteristic equation in (2.17) tells that for any I ^ {1,2, . . . ,di,}, uf'^ over 

Q. is determined by the values of functions {u~i \ . . . , u~^^} over Under discretization it is, for 

any / G {1, 2, . . . , dy], the values of uf^ on all degrees of freedom {R{j) : j = 1,2, . . . , Nq, R G 

G} are determined by the values of functions {u\'\ . . . , u"d^} on {j : j = 1,2, . . . , Nq}. Thus, 
the size of discretized eigenvalue problem for (2.17) is dyNQ. 

If the given irreducible representation T^'^^ is one-dimensional, then (2.17) gives 

'u(^) = ond^, (3.1) 

u^^^Rx) = rW(i?)* u(^)(x) mQ,yR, 

where we omit subscripts of r('^^(i?)ii and 

Suppose the discretized system for eigenvalue problem (3.1) are 

No No 

Yl Yl = '^'^"^ Yl Y Krw'^ru) V i = 1, 2, . . . , A^o- 

i=i ReG j=i rgg 

We know from the symmetry characteristic that the discretized system can be reduced to 
Y Y r^'^^(^)* = ^^'^ Y Y r^'^(^)* hnu^n, V ^ = 1,2, . . . ,iVo. 

j=l R£G j=l ReG 

Denote the solution vector as 

U = {ui,U2, . . .,UNoV- 

We write the discretized system into the matrix form 
where 



B = {Bij)NoxNo, Bij = Y^j^^f^T'^'^XR) b.^R^^y 



(3.2) 



In the case of higher-dimensional irreducible representations, the d^, subproblems in (2.17) 
are coupled through symmetry characteristics. 
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Taking d,^ = 2 as an example, we assemble subproblems for u^^^ and u^2^ in (2.17) to solve 
eigenvalue problem 







on dQ. 



Suppose the discretized system for (3.3) are 

{^f=l ^ReG C'i.RU) ^^i,-R(i) = ^^'^^ ^f=l ^ReG ^i,RU) ^-^,R(j) V Z = 1, 2, . . . , A^'o, 
Yl!j=l ^R£G ^i,R(i) '^2,R{3) = ^^^"^ Sj^l SflGG ^i,RU) '^2,RU) V Z = 1, 2, . . . , Nq. 

Denote the solution vector as 

V = (nii,-Ui2, . . . ,UiNo, U21,U22, ■ ■ ■ ,U2NoV 

and write the discretized system in a matrix form 
Then 

( ^[11] ^[12] A ( B[u] ^[12] 



12 ^i,R{i)^ 



22 '^i.RU)- 



A[2l] A[22] J ' V ^[21] ^[22] 



where A[^i] = {A[^i]ij)NQxNo {rn,l = 1,2) with 

= Efl,Gr(^)(^)na.«0), A[,2]^J=EReG'^^^'HR) 
Al21]ij = Y^ReG^^''H^)21 O-^Mj)^ ^[22]ij = EiJeGr^''H-R)22 



(3.4) 



(3.5) 



Matrix elements of B are in the same form as those of A and can be obtained by substituting 
^iMU) with 

Remark 3.1. If symmetry group G is Ahelian, each irreducible representation is one- dimensional 
and all discretized subproblems are independent. Otherwise, there exist T^'^^ with dy > 1 and the 
corresponding dy discretized subproblems are coupled through symmetry characteristics. Thus, 
no matter G is Abelian or not, we shall solve ric decoupled eigenvalue subproblems, where 
is the number of irreducible representations. And the size of discretized system for the u-th 
problem is d^N^. 



3.2 Finite element discretization 

In this part, taking the finite element discretization as an example, we study the relation 
between our discretized systems and those from constructing SALCs. We propose Proposition 
3.1 and Remark 3.2 to guide the construction of SALCs from mathematical viewpoint. 
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Suppose the weak form of (2.14) is: Find (A, u) € M x y such that 

a{u,v) = X{u,v) Vv G V. (3-6) 

Consider the finite element discretization, we denote the basis function corresponding to 
any j G {1, 2, . . . , A'o} as ipj. It can be seen from Pj^ipj{x) = (pj{R~^x) that Pripj is the basis 
function corresponding to R{j), i.e., 

In this case, our discretized systems are determined by setting Ui^ji^j) and in (3.2) and 

(3.5) as 

a.,ii(i) = "'iPB.Vj,Vi), Kb.u) = {PRfj,fi)- (3-7) 

We shall study the relation between our discretized systems with those from SALCs. First we 
discuss more about the construction of SALCs. 

For the given z/, we fix some k € {1, 2, . . . , d,^} and generate SALCs for the A;-th subproblem 

in (2.15). This is achieved by applying projection operator ^j^j} on all the finite element basis 
functions {PR(pj : j = 1, 2, . . . , A^O; R £ G}. Suppose we obtain A^' linearly independent basis 
functions from this process and we denote them as {^j : j = 1, 2, . . . , A^'}. Then for any 

/ G {1, 2, . . . , du}, SALCs for the l-th. subproblem can be given as {^Ik^^j : j = 1, 2, . . . , A^'}. 

Consider the dy discretized systems under the generated SALCs. Matrix elements of the 
l-th. discretized system are 

a(^i^)^„ : J = 1, 2, . . . , N'. 

For each j G {1, 2, . . . , A^'}, functions {^ik^'^j : I = 1,2, . . . ,d,y} form a basis for F^''). We see 
from (2.3) and Proposition 2.3 that all the d^ discretized systems are the same. So we only 
need to solve the discretized system corresponding to the k-th subproblem: 

N' N' 

J^a(M'j,M',)9 = A^'^) J^(M'j,M',)cj i = l,2,...,N'. (3.8) 
After calculating {cj}, the approximated eigenfunctions can be achieved by 

N' 

Now we give a result on how to determine the value of A^', and explain the specific way to 
obtain these functions. 

Proposition 3.1. Given v G {1, 2, . . . , nd and k G {1, 2, . . . , d^}. If j G {1, 2, . . . , Nq} satisfies 
that {PRipj : R G G} are linearly independent, then there are exactly dy linearly independent 
functions in {^j^^ Pjiipj : R G G}. 

Proof. For any R ^ G, since 

R'dG 

= ^^FH(5i?-)LP,^„ 
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we see that PrV^j is a linear combination of functions {Ps^j '■ S € G}, and the coefficient 

For the given j, {Ps^j '■ S € G} are hnearly independent. And runs over ah elements of 
group G when R does. So the number of linearly independent functions in {^^^ Pr'^j '■ R G G} 
equals to the rank of a matrix G = {Gmn)gxg, where 



Gmn — \PmRn)kk- 

It can be seen that Gmn is the multiplication of the k-th row of T^'^\R„ 
column of rH(iJ„)*. Then G can be written as 



and the fc-th 



G 



fTM(Ri)l, ... rW(i?i)L. \ 



V T^^^R. 



( rH(i?i) 
rH(iii);, 



'9)kl 



Ifc 



C1C2. 



Gi and C2 are g x and d^, x g matrices, respectively. We obtain from the great orthogonality 
theorem (2.1) that columns of Ci are orthogonal, and so are rows of G2, i.e., 

rank(Ci) = rank(C2) = d^- 



So 

rank(C) = rank(CiC2) = d^, 
and we have proved the proposition. 



□ 



Remark 3.2. For the given v and k, if j satisfies some condition, Proposition 3.1 indicates that 
there are dy linearly independent basis functions adapted to the u-k symmetry in {j^j^j} Pjupj : 
R € G}. It remains a problem how to obtain the d^ functions. We see from (3.9) that, 
whenever the chosen d^ operators {Rn (z G : n = l,...,di,} satisfy that the k-th columns of 



matrices {r('^)(i?„^) : n = l,...,di^} are linearly independent, 
exactly give the dy linearly independent functions. 



kk 



PRn'Pj ■■ n = l,...,du} 



In the case of diy = 1, we apply projection operator ^^'^^ on all the finite element basis 
functions to construct the SALCs. We see from Proposition 3.1 that for each j € {1, 2, . . . , Nq}, 
S^(^i'^) p^ip ■ : R g G} give one symmetry-adapted basis function. According to Remark 3.2, we 
can choose R = E to get all the Nq SALCs as 



- - '"Pj^ 3 
The discretized system under these SALCs are 



1,2,..., iVo. 



Equivalently, 
where 



Au = XBu, 



1,2, 



(3.10) 



(3.11) 
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It can be seen that 



Thus we get 



A = -A, B = -B. 
9 9 



A = A, u = li. (3.12) 

In the case of di, = 2, there are two subproblems in (2.15). We choose k = 1 and apply 
projection operator ^[i^ on all the finite element basis functions to construct SALCs for the 
first subproblem. Proposition 3.1 tells that for each j € {1,2,... ,A''o}, {^[^^ PRipj : R G G} 
give dp = 2 linearly independent symmetry-adapted basis functions. Based on Remark 3.2, we 
choose identity operation E and another S £ G which satisfy that the first columns of matrices 
|r('^)(^) ^ T^'^\S~^) } are linearly independent. Then 



give all the 2A'^o basis functions adapted to the u-l symmetry. We denote them as 

(cl>l, . . . , $;Vo, ^1, • • • , ^TVo) = (^liVl, • • • , ^liVNo, ^li^PsVl, ^ii^PsVNo)- 
The discretized system under these SALCs are 

f EfiiW^i,<^.)aj = AEfii{(^i,^*K + {^j,^^)bJ} V i = 1, 2, . . . , TVq, 

I Ef=iH^j,^i)aj + ^^)bj} = A Ef=i{i^j,^i)aj + {^J,^^)bJ} V i = 1, 2, . . . , iVo, 

(3.13) 

Equivalently, 

Av = ~xm, 

where 



V = (ai,a2, . . . ,aNo, bi,^2, • • • ,bNoV 

( ^[11] ^[12] \ 5[12] 

V ^[21] ^[22] / V -^[21] -^[22] 



It can be calculated that 



^[11] = -^[11], 

V] = ^(r('^^(5)iiAii]+r^'^(^)i2Ai2]) 
A21] = -(r('^H5)n^[ii]+r(^^(5);2V] 

9 

^[22] = ^{rM(5)ii(rM(5);iA[n]+rH(5);2^pi] 

+ rH(5)i2 (rW(5);,A[i2] +rH(5);2^[22]) }• 



Let 
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we have 



QlAQr 




Similarly 



QiBQr 




Thus we obtain 



A = A, V 



(3.14) 



I.e. 



uij = aj + T^''\S\^bj, U2j = T^''\S)^^bj, j = 1, 2, . . . 



Consider a given v € {1,2,..., nd, the method of constructing SALCs seems to have an 
obvious advantage that the dp subproblems are decoupled. Proposition 3.1 tells that the number 
of SALCs for each subproblem is in fact cI^Nq. Therefore, the coupled eigenvalue problem 
appeared in our decomposition approach is not an induced complexity, but some reflection of 
the intrinsic property of symmetry-based decomposition. 

Solving subproblems instead of the original eigenvalue problem shall reduce the computa- 
tional overhead and memory requirement to a large extent. The eigenvalues to be computed 
are distributed among subproblems, and the decomposed problems can be solved in a small 
subdomain. Moreover, as indicated in Section 2, there is a possibility to improve the spectral 
separation, which would accelerate convergence of iterative diagonalization. Later, we shall 
address some issues gained from a parallel implementation of the decomposition approach and 
propose a way to analyze the practical decrease in the computational cost. 

4 Complexity and performance analysis 

The distinguished advantage of solving subproblems (2.15) instead of the original problem (2.14) 
is the reduction in computational overhead. Based on a complexity analysis, we quantize this 
reduction and present a way to analyze the practical speedup in CPU time. 

4.1 Complexity analysis 

Computational complexity is the dominant part of computational overhead when the size of 
problem becomes sufficiently large. So the fundamental step of complexity analysis is to figure 
out the computational cost in floating point operations (flops). 

The algebraic eigenvalue problem will be solved by the implicitly restarted Lanczos method 
(IRLM) implemented in ARPACK package [34] . IRLM is the widely-used and practical variant 
of Krylov subspace iterative method for computing eigenvalues of large-scale Hermitian ma- 
trices. Our complexity analysis will be based on IRLM, whereas it can be extended to other 
iterative diagonalization methods. 

Total flops of the iterative method is the product of the number of iteration steps and 
number of flops per iteration. Since the number of iterations is problem dependent, we analyze 
the number of flops per iteration, for which purpose we represent the procedure of IRLM as 
Algorithm 4.1. 

Table 1 is a supplementary remark to Algorithm 4.1. In Algorithm 4.1, Step 2 is the Schur 
decomposition of Hm, and consumes about 6m? flops [24]. Steps 4 to 7 do /-step QR iteration 
with shifts. On the assumption that each Qj is the product of (m — 1) Givens transformations. 
Step 5 costs 8m (m — 1) flops since applying one Givens transformation to one matrix only 
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Algorithm 4.1: An implicitly restarted Lanczos method 



Input: Maximum number of iteration steps; The m-step Lanczos Factorization 
1 repeat 

Compute the Schur decomposition of the symmetric tridiagonal matrix Hm and 
select the set of / shifts /ii , ^2 1 • • • , ; 

for j = 1, 2, . . . , / do 
end 

fk ^ Vk+if3k + fmCTk; Vk ^ Vm{l : n, 1 : A;); Hk ^ i^m.(l : A;, 1 : A;); 
Beginning with the /c-step Lanczos factorization AV^ = VkH^ + e^/^, apply I 
additional steps of the Lanczos process to obtain a new m-step Lanczos factorization 

10 until Convergence or the number of iteration steps exceeded the maximum one; 



Table 1: Notations in Algorithm 4.1 



notation 


description 


m 


the maximum dimension of the Krylov subspace, 
twice as many as the number of required eigenvalues in our computations 


I 


the number of Lanczos factorization steps, s.t. m = k + 1 


N 


the number of DOFs 


A 


the (sparse) matrix size of N , 
arising from the grid-based discretization of (2.14) or (2.15) 


Vrn 


the matrix size of x ?7t,, 
made of m column vectors as the basis of the Krylov subspace 


Hm 


the symmetric tridiagonal matrix size of m 


fm 


the column vector size of N , 
the residual vector after m steps of Lanczos factorization 


Cm 


the m-length unit column vector which m-th component is one 


Rj 


the upper triangular matrix size of m 


Qj 


the unitary matrix size of m 


Vk+l 


the [k + l)-th column vector of Vm 




Hrn{k + l,k) 


CTk 


the A;-th component of q 
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change two rows or columns. And for the same reason, Step 6 costs 4(m — 1)(A^ + 1) flops. 
Consequently Steps 4 to 7 consumes 4Z(m — l)(2m + + 1) flops. Regardless of BLAS-1 
operations, we do / matrix-vector multiplication operations at Step 9. 

Besides matrix size N, the flops of one matrix-vector multiplication also depend on the 
order of finite difference or finite elements. If the shift-invert mode in ARPACK is applied to 
solve the general eigenvalue problems arising from the finite element discretization, the matrix- 
vector multiplication will be realized by some iterative linear solver. So we cannot figure out 
accurately the flops per matrix- vector multiplication but represent it as 0{N). 

In total, the computational overhead per IRLM iteration can be estimated as 



+ I [A{m - l)(2m + + 1) + 0{N)] 



flops. In general, the matrix size A^ is much more than m in either finite difference or finite 
element discretizations. So the majority of the flops per IRLM iteration is 



/(/, m,N) = l [4mA + 0{N)] . 



(4.1) 



In order to make clear the reduction in flops per iteration, we divide flops per iteration into 
two parts. One is required by /-step QR iteration, and the other is spent on I operations of 
matrix- vector multiplication. We denote them by /i and /2 respectively and rewrite (4.1) as 
follows 



/(/,m,A) = /i(/,?n,A) + /2(/,m,A), 
/i(/,m,A) = 4/mA, 
/2(Z,?n,A) ^ 0{lN). 



(4.2) 



In solving the original eigenvalue problem (2.14), the major flops per IRLM iteration can 
be accounted as (4.1) or (4.3). In solving one subproblem (2.15), m is reduced to m/di, N is 
reduced to N/g, I is replaced by I/O2, where g is given in (A.l), ^1 > 1 and 62 ~ ^1 as / is almost 
proportional to m in Algorithm 4.1. Due to Yl^=i ^-f subproblems in total and the identical 
9i for each subproblem (2.15), we see that the majority of total flops per iteration for all the 
subproblem is 



ripfi 



I m A, 



rir. 



J_ m A f (J_ 

J-f,{l,m,N) + ^f2{l,m,N) 

71(72 (72 



where the number of subproblems 



rir, 



(4.3) 



(4.4) 



As mentioned in Section 3, it can be qualitatively said that the decomposition approach 
saves the computational cost of solving the eigenvalue problem. Now the reduction can be 
quantized by (4.3). 



4.2 Performance analysis 

Since the order of factors for /i and /2 is not identical in (4.3), the practical speedup in CPU 
time cannot be properly quantized by (4.3). Thus, in solving the original eigenvalue problem 
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(2.14), we introduce the CPU time ratio oj of the matrix- vector multiphcations to the whole 
IRLM process. It is an a posteriori parameter which screens affects of implementation, runtime 
environment, as well as the specific linear solver for the shift-invert mode. Besides, testing for 
ijj is feasible as the operation of the matrix-vector multiplication is usually provided by users. 

Applying the symmetry-based decomposition approach instead of solving (2.14) directly, we 
can explain the speedup in CPU time of one iteration as follows: 



(4.5) 



np [1 + (^1 - l)w] 

gel 

Up [1 + {Oi - l)uo] ■ 



(4.6) 



In practice, 62 is actually determined by the internal configurations of algebraic eigenvalue 
solvers. So we prefer (4.6) and use it to predict the CPU time speedup before solving subprob- 
lems (2.15). 

In Section 6, the validation of (4.5) will be well supported by numerical experiments. More- 
over, this performance analysis implies that the speedup will be amplified while more required 
eigenvalues are wanted and a consequent grow in oj is very likely. Therefore, the symmetry-based 
decomposition will be attractive for large-scale eigenvalue problems. 



5 Practical issues 

In this section, we address some key issues on implementation. 
5.1 Implementation of symmetry characteristics 

Symmetry characteristics play a critical role in the decomposition approach, so it is important 
to preserve and realize symmetry characteristics for discretized eigenfunctions. 

For all the degrees of freedom not lying on symmetry elements, the implementation of 
symmetry characteristics is straightforward with grid discretization. If x € is a grid point 
lying on the symmetry element corresponding to operation R & G, the symmetry characteristic 

m=l 

reduces to ^ 

^!^)(x) = x;rH(i?)LnH(x). 

771=1 

If it holds that 

det(r('^)(i?)-/rf,xd.) /o, 

all values u^'^^\x), . . . ,u^^^[x) are zeros. Otherwise, we have to find the independent ones out 

of u^i\x), . . . , u^J^\x) and treat them as the other degrees of freedom. 

In our computation, we discretize the problem on a tensor-product grid associated with 
the symmetry group. Currently, for simplicity, we just avoid symmetry elements on coordinate 
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planes, by imposing an odd number of partition in each direction and using finite elements of 
odd orders. 



5.2 Distribution of required eigenvalues among subproblems 

The required eigenvalues of the original eigenvalue problem (2.14) are distributed among asso- 
ciated subproblems, and the number of eigenvalues required by each subproblem can be almost 
reduced by as many times as the number of subproblems. However, we cannot know in advance 
the symmetry properties of eigenfunctions corresponding to required eigenvalues. Thus we have 
to consider some redundant eigenvalues for each subproblems. 

We suppose to solve the first m eigenvalues of the original eigenvalue problem. First we 
set the number of eigenvalues to be computed for each subproblem as ■ plus redundant 

iria eigenvalues. After solving the subproblems, we gather the eigenvalues from all subproblems 
and sort them in the ascending order. After taking m smallest eigenvalues, we check which 
subproblems the remaining eigenvalues belong to. If there are no eigenvalues left for some 
subproblem, the number of computed eigenvalues for this subproblem is probably not enough. 
Subsequently we restart computing this subproblem with an increasing number of required 
eigenvalues. 



5.3 Two-level parallel implementation 

We have addressed in Section 3 that the Uc decomposed problems are independent to each 
other and can be solved simultaneously. Accordingly we have a two-level parallel implemen- 
tation illustrated by Figure 1. At the first level, subproblems are dispatched among groups 
of processors. At the second level, the irreducible grids are distributed among each group of 
processors. Since eigenfunctions of different subproblems are naturally orthogonal, there is no 
communication between different groups of processors during solving the eigenvalue problem. 
Such two-level or multi-level parallelism is likely appreciable for the architecture hierarchy of 
modern supercomputer. In Section 6.3, it will be shown that the two-level parallel implemen- 
tation reduces the communication cost. 



Solve the eigenvalue problem 
with symmetries 




Solve subproblems on 
a group of processors 



Solve subproblems on 
a group of processors 



Solve subproblems on 
a group of processors 



Solve subproblems on 
a group of processors 



CPU 1 



CPU P 



Distributed over 
irreducible grids 



CPU 1 



CPU P 



Distributed over 
irreducible grids 



CPU 1 



CPU P 



Distributed over 
irreducible grids 



CPU 1 



CPU P 



Distributed over 
irreducible grids 



Figure 1: Schematic illustration for two- level parallel implementation for solving the eigenvalue 
problem with point symmetries. Actually, the number of processors of each group can be in 
proportion to d^No, the size of discretized basis for each subproblem. 
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6 Numerical tests and applications 



In this section, we present some numerical examples arising from eigenvalue problems in quan- 
tum mechanics to validate the implementation and illustrate the efficiency of the decomposition 
approach. We use hexahedral finite elements for discretizations and consider crystallographic 
point groups of which symmetry operations keep the hexahedral grids invariant. The ma- 
trix eigenvalue problem is solved by subroutines of ARPACK. Our computing platform is the 
LSSC-III cluster provided by State Key Laboratory of Scientific and Engineering Computing 
(LSEC). 

6.1 Validation of implementation 

First we validate the implementation of the decomposition approach. Consider the harmonic 
oscillator equation which is a basic quantum eigenvalue problem as like: 

-^Au + ^\x\'^u = Xu in M^. (6.1) 

The exact eigenvalues are given as 

Xk,m,n = k + m + n + 1.5, k,m,n = 0,1,2, 

In this example, we solve the first 10 eigenvalues. The computation can be done in a finite 
domain with zero boundary condition since the eigenfunctions decay exponentially. 

It is obvious that the system is spherically symmetric. As representatives, we test Abelian 
subgroup and non- Abelian subgroups D4 and D2d. Table 2 gives the irreducible represen- 
tation matrices of these groups. We refer to Appendix A for a detailed description about the 
notation in the table. The hexahedral grids can be kept invariant under the three groups. 



Table 2: Representation matrices of point symmetry groups D2d, and D4 



D2h 




E 


C2:c 


C2c 


C2f 






IC2^ 


IC2f 


r(i) 




1 


1 


1 


1 


1 


1 


1 


1 


r(2) 




1 


1 


-1 


-1 


1 


1 


-1 


-1 


r(3) 




1 


-1 


1 


-1 


1 


-1 


1 


-1 


r(4) 




1 


-1 


-1 


1 


1 


-1 


-1 


1 


r(5) 




1 


1 


1 


1 


-1 


-1 


-1 


-1 


r(6) 




1 


1 


-1 


-1 


-1 


-1 


1 


1 


r(7) 




1 


-1 


1 


-1 


-1 


1 


-1 


1 


r(8) 




1 


-1 


-1 


1 


-1 


1 


1 


-1 


Dad 




E 


C2y 


IC4y 


ic-^ 


IC2:c 


IC2, 


C2c 


C'2d 


r{i) 




1 


1 


1 


1 


1 


1 


1 


1 


r(2) 




1 


1 


-1 


-1 


1 


1 


-1 


-1 


r(3) 




1 


1 


1 


1 


-1 


-1 


-1 


-1 


r(4) 




















r(5) 




'n 


( V ' ) 


( : ) 


( ; ■'-«■ ) 


( -f ? ) 


( : ) 


(:';,) 






(;. 


















D4 




E 


C2y 






C2^ 




C2c 


^2d 


r(i) 




1 


1 


1 


1 


1 


1 


1 


1 


r(2) 




1 


1 


-1 


-1 


1 


1 


-1 


-1 


r(3) 




1 


1 


1 


1 


-1 


-1 


-1 


-1 


r(4) 




















r(5) 




'■;) 




( ? "-.! ) 


( °' ) 


( : ) 


( -f ) 


( ';';,) 






(;. 



















According to Theorem 2.1, we can decompose the origianl eigenvalue problem (6.1) as 
follows: 

1. Applying D2d, we have 8 completely decoupled subproblems; 
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2. Applying D4 or D2d, we have 6 subproblems and two of them corresponding to represen- 
tation r(^) are coupled eigenvalue problems. 

We employ trilinear finite elements to solve these eigenvalue subproblems, and tell from 
the convergence rate of eigenvalues that the implementation is correct. Taking non-Abelian 
group D4 for instance, we exhibit in Figure 2 errors in eigenvalue approximations obtained 
from solving the subproblems. And the /i^-convergence rate can be observed. 

Moreover, in Table 3, we list the u-l symmetries of computed eigenfuntions from solving the 
subproblems. It can be seen that degenerate eigenvalues are distributed over subproblems. 




Figure 2: Error in the eigenvalue approximations from solving 6 subproblems associated with 
non-Abelian group D4. Errors in the first three different eigenvalues are labeled as ei, e^, 65 
respectively. The h'^ -convergence rate can be observed. 



6.2 Reduction in computational cost 

Taking Abelian group D2/1 as an example, we compare the computational cost of solving the 
original eigenvalue problem (6.1) with that of solving 8 subproblems. The first 110 eigenvalues 
of the original eigenvalue problem are needed. And it is sufficient to solve the first 22 eigenvalues 
of each subproblem. In order to illustrate and analyze the saving in computational cost, we 
launch the tests on a single CPU core. 

In Table 4, we present statistics from trilinear finite element discretizations. The average 
CPU time of a single iteration during solving the original problem (6.1) is 42.29 seconds while 
that of solving 8 subproblems is 3.61 seconds ^. In Table 5, we present statistics from tricubic 
finite elements. The average CPU time of a single iteration during solving the original problem 
(6.1) is 60.80 seconds while that of solving 8 subproblems is 9.71 seconds. 

^ We count the average CPU time of a single iteration for each subproblem and then accumulate them. Taking 
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Table 3: The v-l symmetries of the first 10 eigenfunctions of problem (6.1) 





U 


2h 


U 


4 






V 


1 


u 


; 


7 / 7 




-1 
i 


-1 
i 


-1 

i 


i 


i i 




o 



1 


6 


1 


5 1 


Us 


a 



1 
1 


c; 



1 
1 


Z 




7 


1 


5 


2 


2 1 


Ms 


3 


1 


5 


1 


4 1 


Me 


4 


1 


5 


2 


5 1 


M7 


2 


1 


4 


1 


5 2 


Us 


1 


1 


1 


1 


1 1 


Ug 


1 


1 


2 


1 


2 1 


UlO 


1 


1 


1 


1 


1 1 



Table 4: Statistics of solving the original problem (6.1) and 8 subproblems using trilinear finite 
elements. In Column 1, subproblems are labeled by different us. In Columns 2 and 3, we list 
the number of major iteration steps and matrix-vector multiplications. In Columns 4 and 5, we 
present CPU time spent on matrix-vector multiplications and the whole procedure of IRLM. 



Problem 


#iter. 


#OP*x 


time_mv (sec.) 


time_total (sec.) 


(6.1) 


22 


1599 


175.01 


930.39 


1/ = 1 


18 


356 


5.13 


8.21 


1/ = 2 


22 


420 


6.16 


9.95 


1/ = 3 


22 


421 


6.06 


9.83 


u = A 


21 


406 


5.84 


9.46 


1/ = 5 


18 


353 


5.06 


8.23 


z/ = 6 


21 


405 


5.74 


9.44 


1/ = 7 


20 


389 


5.51 


9.03 


u = 8 


21 


403 


5.75 


9.34 



Table 5: Statistics of solving (6.1) and 8 subproblems using tricubic finite elements. 



Problem 


#iter. 


#OP*x 


time_mv (sec.) 


time_total (sec.) 


(6.1) 


57 


3972 


1696.29 


3465.57 


u = 1 


50 


937 


55.15 


62.75 


v = 2 


64 


1156 


67.75 


77.11 


1/ = 3 


64 


1153 


67.57 


77.09 


1/ = 4 


62 


1128 


66.05 


75.25 


1/ = 5 


47 


892 


52.18 


59.31 


z/ = 6 


69 


1215 


71.15 


81.37 


1/ = 7 


63 


1134 


66.74 


76.12 


1/ = 8 


70 


1230 


72.46 


82.87 
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The speedup in average CPU time of a single iteration is 11.72 with trilinear finite elements 
while it is decreased to 6.26 with tricubic finite elements. This numerical phenomenon can 
be explained by performance analysis (4.6). In our computation, the maximum dimension of 
Krylov subspace is twice the number of required eigenvalues plus 5, which is recommended by 
ARPACK's tutorial examples. So we have 9i = 4.59. It can be calculated from the statistics 
of solving the original problem that the CPU time percentage u of matrix-vector multipli- 
cations is 0.19 with trilinear finite elements and grows to 0.59 with tricubic finite elements. 
Correspondingly, using (4.6), we predict that the CPU time speedup would be 12.58 and 7.64 
respectively. 

According to (4.3), the computational cost of Q-R-iteration grows faster than that of matrix- 
vector multiplication when the number of required eigenvalues increases. Thus we can see that 
the decomposition approach would be more appreciable for large-scale eigenvalue problems. 

6.3 Saving in communication 

Besides the reduction in computational cost, solving decoupled problems will also save com- 
munication among parallel processors. As mentioned in Section 5.3, our implementation of the 
decomposition approach is parallelized in two levels. The DOFs are distributed in each group 
of processors and no communication occurs between any two groups of processors. This leads 
to a saving in communication. 

We still consider the oscillator eigenvalue problem (6.1) and decompose it into 8 subproblems 
according to symmetry group D2h- The comparison of communication between solving the 
original problem and the subproblems is given in Table 6. 

Table 6: Comparison of communication between solving (6.1) and 8 subproblems. Column 1 
gives the number of processors Np. In the other columns, "use symm" corresponds to solving 
the subproblems and "not use" corresponds to solving the original eigenvalue problem. Columns 
2 and 3 give the average number of processors each processor communicates with. Columns 4 
and 5 list the average number of Bytes sent by each processor. And the last two columns report 
the CPU time spent on communication during matrix-vector multiplications. 





Np in comm 


Bytes in 


comm 


CPU time 


in comm (sec.) 


use symm not use 


use symm 


not use 


use symm 


not use 


8 


0.00 1.75 





134,560 


0.00 


8.93 


16 


1.00 1.88 


19,608 


145,451 


0.20 


10.36 



6.4 Applications to Kohn— Sham equations 

Now we apply the decomposition approach to electronic structure calculations of symmetric 
molecules. In the context of density functional theory (DFT), ground state properties of many- 
electron systems are usually obtained by solving the Kohn-Sham equation [27, 31, 35]. It is a 
nonlinear eigenvalue problem as follows 

(^-Ia + U^^]^ ^„ = e„^„ inRS, (6.2) 

Table 4 for example, we have that 3.61 = Mi + M5+2^ + 9^ + S^ + Mi + 2^ + M4 
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where p{r) = X]^=i charge density contributed by A^e eigenfunctions {^'n} 
with occupancy numbers {/n}, and F [p] the so-cahed effective potential which is a nonhnear 
functional of p. On the assumption of no external fields, y^^[/9] can be written into 

yeff ^ yne + yH ^ ^xc^ (g 3) 

where V^^ is the Coulomb potential between the nuclei and the electrons, the Hartree 
potential, and V-^^ the exchange-correlation potential [35]. The ground state density of a 
confined system decays exponentially [2, 23, 41], so we choose the computational domain as an 
appropriate cube and impose zero boundary condition. 

As a nonlinear eigenvalue problem, Kohn-Sham equation (6.2) is solved by the self-consistent 
field (SCF) iteration [35]. The dominant part of computation is the repeated solving of the 
linearized Kohn-Sham equation with a fixed effective potential. The number of required eigen- 
states grows in proportion to the number of valence electrons in the system. Therefore the 
Kohn-Sham equation solver will probably make the performance bottleneck for large-scale 
DFT calculations. 

Real-space discretization methods are attractive for confined systems since they allow a 
natural imposition of the zero boundary condition [6, 32]. Among real-space mesh techniques, 
the finite element method keeps both locality and the variational property, and has been applied 
to electronic structure calculations (see, e.g., [1, 18, 19, 39, 40, 42, 43, 47, 48, 51]); others like 
the finite difference method, finite volume method and the wavelet approach have also shown 
the potential in this field [13, 16, 22, 26, 29, 32, 38]. 

We solve the Kohn-Sham equation of some symmetric molecules with tricubic finite ele- 
ment discretizations. The statistics are summarized in Table 7. The full symmetry group of 
these molecules is T^. For simplicity we select subgroup D2. Accordingly, the Kohn-Sham 
equation can be decomposed into 4 decoupled subproblems. It is indicated by Table 7 that the 
decomposition approach is appreciable for large-scale symmetry molecular systems. 

Table 7: Comparison between solving the original Kohn-Sham equation and subproblems. 
Column 3 gives the number of required eigenstates. The number of DOFs given in Columns 
4 and 5 is required by the convergence of ground state energy [18]. Columns 7 and 8 list the 
average CPU time in diagonalization at each SCF iteration step, which is the dominant part of 
time. The last column is the speedup of the decomposition approach. 



System 


G 




DOFs 
not use use symm 


Np 


CPU time in diag. (sec.) 
not use use symm 


Speedup 


C123H100 


D2 


300 


1,191,016 297,754 


32 


2,783 558 


4.99 


C275H172 


D2 


640 


1,643,032 410,758 


32 


13,851 1,559 


8.88 


C525H276 


D2 


1200 


2,097,152 524,288 


64 


25,296 2,334 


10.84 



7 Concluding remarks 

In this paper, we propose a decomposition approach to eigenvalue problems with spatial sym- 
metries. Different from the classical treating of symmetries in quantum chemistry, our approach 
does not explicitly construct the symmetry-adapted linear combinations (SALCs). Instead, we 
formulate a set of eigenvalue subproblems easy for grid-based discretizations. Moreover, we 
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provide mathematical guidance to the construction of SALCs and obtain the exact relation 
between the two approaches. 

Such a decomposition can reduce the computational cost remarkably since a smaller number 
of eigenpairs are solved for each subproblem. The quantization of this reduction implies that 
our approach could be appreciable for large-scale eigenvalue problems. In practice, we solve 
a sufficient number of redundant eigenpairs for each subproblem in order not to miss any 
eigenpairs. It would be very helpful for reducing the extra work if one could predict the 
distribution of eigenpairs among subproblems. 

Under finite element discretizations, this decomposition method has been applied to Kohn- 
Sham equations of symmetric molecules. If solving Kohn-Sham equations of periodic crystals, 
we should consider plane wave expansion which could be regarded as grid-based discretization 
in reciprocal space. In Appendix C, we will show that the invariance under some coordinate 
transformation can be kept by Fourier transformation. So the decomposition approach is also 
applicable to plane waves. 

In numerical examples, we treat only a part of cubic symmetries for validation and illustra- 
tion. Obviously, the decomposition approach and its practical issues can be adapted to other 
spatial symmetries with appropriate grids. 

In this paper, we concentrate on spatial symmetries. It is possible to use other symmetries 
in some specific areas. For instance, [20, 36] exploit the angular momentum, spin and parity 
symmetries of atoms during solving the Schrodinger equation. It is our ongoing work to exploit 
these underlying or internal symmetries. 
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Appendix A: Basic concept of group theory 

In this appendix, we include some basic concepts of group theory for a more self-contained 
exposition. They could be found in standard textbooks like [30, 44, 49]. 

A group G is a set of elements {R} with a well-defined multiplication operation which satisfy 
several requirements: 

1. The set is closed under the multiplication. 

2. The associative law holds. 

3. There exists a unit element E such that ER = RE = R for any R ^ G. 

4. There is an inverse in G to each element R such that RR^^ = R^^R = E. 

Group G is called finite if it contains a finite number of elements. And this number, denoted 
by g, is said to be the order of the group. The rearrangement theorem tells that the elements 
of G are only rearranged by mutiplying each by any R € G, i.e., RG = G for any R ^ G. 

An element i?i € G is called to be conjugate to R2 if R2 = SRiS~^, where S is some 
element in the group. All the mutually conjugate elements form a class of elements. It can be 
proved that group G can be divided into distint classes. Denote the number of classes as iic- 

Two groups is called to be homomorphic if there exists a correspondence between the el- 
ements of the two groups as R o R'^, R2, ■ ■ ., which means that if RS = T then the product 



24 



of any R[ with any S'^ will be a member of the set {T[,T2, . . .}. In general, a homomorphism 
is a many-to-one correspondence. It specializes to an isomorphism if the correspondence is 
one-to-one. 

A representation of a group is any group of mathematical entities which is homomorphic to 
the original group. We restrict the discussion to matrix representations. Any matrices repre- 
sentation with nonvanishing determinants is equivalent to a representation by unitary matrices. 
Two representations are said to be equivalent if they are associated by a similarity transforma- 
tion. If a representation can not be equivalent to representations of lower dimensionality, it is 
called irreducible. 

The number of all the inequivalent, irreducible, unitary representations is equal to the 
number of classes in G. The Celebrated Theorem tells that 

Y.dl = g, (A.l) 

where du denotes the dimensionality of the z^-th representation. 

Finally we introduce some crystallographic point groups used in our examples. Groups T>2h, 
Y)2d a-iid D4 are three dihedral groups; the first one is Abelian and the other two are non- 
Abelian. In Table 2, Cnj denotes a rotation about axis Oj by 27r/n in the right-hand screw 
sense and I is the inversion operator [14]. 



Appendix B: Proof of Proposition 2.2 

Proof, (a) Since {Pr} are unitary operators, we have 

\ ^ ReG / ^ R€G ^ SeG 

which together with the fact that T^'^^ is a unitary representation derive 

^':r = '-^T.^'^\s)iPs = ^l::!. 

(b) According to the definition 

V ^ ReG / \ ^ S€G / 

^ ReG VseG / 

Note that the rearrangement theorem implies that, when S runs over all the group elements, 
S' = RS for any R also runs over all the elements. Hence we have 

^ ReG \s'eG / 

= ^E fEr^'''(^)wr(^'^(^"^5');.^)^'5'. 

^ S'eG \ReG / 
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We may calculate as follows 

ReG ReG \n=l 

R€G \n=l 
n=l \R€G ) 

where the last equality is derived from the great orthogonality theorem. Thus we arrive at 



□ 



9 

^ S'6G 



Appendix C: Spatial symmetry in reciprocal space 

Plane wave method is widely used for solving the Kohn~Sham equations of crystals. Actually, 
plane waves are regarded as grid-based discretizations in reciprocal space. We will show that the 
symmetry equation in real space is kept in reciprocal space. The solution domain ^ of crystals 
can be spanned by three lattice vectors in real space. On the assumption that a function / 
is invariant with integer multiple translations of the lattice vectors, we present the function in 
reciprocal space as like: 

/(q) = ^E/W^"''^ (c.i) 

r 

where A'^ is the number of grid points. If / is kept invariant with the coordinate transform R 
in f2, then 

r 

= ^E/w--^'^-^^'^^) 

r 

= /(q). 

The above second equation holds since the coordinate transformation R can be represented as 
an orthogonal matrix. And the above third one holds due to the assumption that 

f{Rr) = f{r), Vrea 

Thus the decomposition method is probably applicable to plane waves. 
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